How ‘dark LLMs’ produce harmful outputs, despite guardrails

And it’s not hard to do, they noted. “The ease with which these LLMs can be manipulated to produce harmful content underscores the urgent need for robust safeguards. The risk is not speculative — it is immediate, tangible, and deeply concerning, highlighting the fragile state of AI safety in the face of rapidly evolving jailbreak techniques.”
Justin St-Maurice, technical counselor at Info-Tech Research Group, agreed. “This paper adds more evidence to what many of us already understand: LLMs aren’t secure systems in any deterministic sense,” he said. “They’re probabilistic pattern-matchers trained to predict text that sounds right, not rule-bound engines with an enforceable logic. Jailbreaks are not just likely, but inevitable. In fact, you’re not ‘breaking into’ anything… you’re just nudging the model into a new context it doesn’t recognize as dangerous.”
The paper pointed out that open-source LLMs are a particular concern, since they can’t be patched once in the wild. “Once an uncensored version is shared online, it is archived, copied, and distributed beyond control,” the authors noted, adding that once a model is saved on a laptop or local server, it is out of reach. In addition, the researchers found that the risk is compounded because attackers can use one model to generate jailbreak prompts for another.